ggplot2 is an R package for data visualization
It implements the “grammar of graphics”
Popular (well-supported, great community)
Open source (like all of R)
Easy to use (after a learning curve)
Aesthetically pleasing
Built for multi-variate data
Reproducible figures
http://spatial.ly/2013/12/introduction-spatial-data-ggplot2/
http://chrisladroue.com/2012/02/polar-histogram-pretty-and-useful/
https://twitter.com/bettermeasured/status/885956556046643200
Teach you enough that you know how to teach yourself more!
Introduce “grammar of graphics” concepts
Send you away with a list of resources
ggplot(data = diamonds, aes(x = carat, y = price, color = clarity)) +
  geom_point() +
  facet_grid(color ~ cut)

Download scripts from the following site: https://is.gd/IYoXwA
Open RStudio
In RStudio, open “install_packages.R”. Highlight the text and click “Run”.
Still in RStudio, open “workshop_script.R”. We’ll work from this for the rest of the presentation.
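The contents of “install_packages.R” are not reproduced here. As a rough sketch, a script like it might check which packages are missing and install only those. The package list below is an assumption, inferred from the functions used later in this presentation; the real script may differ.

```r
# Hypothetical sketch of an install script; the actual install_packages.R may differ.
# The package list is an assumption based on the functions used in this workshop.
workshop_packages <- c("ggplot2", "ggthemes", "ggjoy", "viridis", "wesanderson",
                       "reshape2", "knitr", "gapminder", "dplyr", "hexbin", "maps")

# Return the packages from `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(workshop_packages)
print(to_install)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

The install step is left commented out so that running the sketch only reports what is missing.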
First, we need data.
Let’s use the “diamonds” dataset, which ships with ggplot2.
| carat | cut | color | clarity | depth | table | price | x | y | z |
|---|---|---|---|---|---|---|---|---|---|
| 0.23 | Ideal | E | SI2 | 61.5 | 55 | 326 | 3.95 | 3.98 | 2.43 |
| 0.21 | Premium | E | SI1 | 59.8 | 61 | 326 | 3.89 | 3.84 | 2.31 |
| 0.23 | Good | E | VS1 | 56.9 | 65 | 327 | 4.05 | 4.07 | 2.31 |
| 0.29 | Premium | I | VS2 | 62.4 | 58 | 334 | 4.20 | 4.23 | 2.63 |
| 0.31 | Good | J | SI2 | 63.3 | 58 | 335 | 4.34 | 4.35 | 2.75 |
| 0.24 | Very Good | J | VVS2 | 62.8 | 57 | 336 | 3.94 | 3.96 | 2.48 |
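Before plotting, it can help to check the size and column types of the data. A minimal sketch, assuming ggplot2 is installed (the diamonds data are loaded with that package):

```r
library(ggplot2)  # provides the diamonds dataset

dim(diamonds)          # number of rows and columns
str(diamonds)          # column types: numeric measurements and ordered factors
levels(diamonds$cut)   # categories of cut, in their stored order
```

Knowing which columns are numeric and which are categorical helps when deciding what to map to x, y, and color.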
ggplot(data = diamonds)

ggplot(data = diamonds,
       aes(x = carat, y = price))

ggplot(data = diamonds,
       aes(x = carat, y = price)) +
  geom_point()

ggplot(data = diamonds,
       aes(x = carat, y = price, color = clarity)) +
  geom_point()

# Here color = clarity sits outside aes(), so it is not mapped to the points:
ggplot(data = diamonds,
       aes(x = carat, y = price), color = clarity) +
  geom_point()

# A constant color is set outside aes(), inside the geom:
ggplot(data = diamonds,
       aes(x = carat, y = price)) +
  geom_point(color = "magenta")

(with a small difference:)
ggplot(data = diamonds,
       aes(x = carat, y = price)) +
  geom_point(aes(color = clarity))

ggplot(data = diamonds,
       aes(x = carat, y = price)) +
  geom_point(aes(color = clarity)) +
  geom_smooth()

ggplot(data = diamonds,
       aes(x = carat, y = price)) +
  geom_point(aes(color = clarity)) +
  geom_smooth(color = "black", size = 0.8, linetype = 2)

ggplot(data = diamonds,
       aes(x = carat, y = price)) +
  geom_point(aes(color = clarity)) +
  geom_smooth(color = "black", size = 0.8, linetype = 2) +
  theme_few()

ggplot(data = diamonds,
       aes(x = carat, y = price)) +
  geom_point(aes(color = clarity)) +
  geom_smooth(color = "black", size = 0.8, linetype = 2) +
  facet_wrap(~cut)

ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(aes(color = clarity), size = 0.5) +
  facet_grid(color ~ cut)

ColorBrewer is useful and popular:
ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(aes(color = clarity)) +
  theme_few() +
  scale_color_brewer(type = "qual", palette = "Set2")

ggplot(data = diamonds,
       aes(x = clarity, y = price)) +
  geom_violin()

ggplot(data = diamonds,
       aes(x = price, y = cut)) +
  geom_joy()

ggplot(data = diamonds,
       aes(x = price, y = cut, color = cut, fill = cut)) +
  geom_joy(alpha = 0.6, scale = 5) +
  scale_fill_viridis(option = "A", discrete = TRUE) +
  scale_color_viridis(option = "A", discrete = TRUE) +
  theme_few()

ggplot(data = diamonds,
       aes(x = price, y = cut, fill = cut)) +
  geom_joy(alpha = 1, scale = 5) +
  scale_fill_manual(values = wes_palette("Darjeeling")) +
  theme_few()

kable(head(iris))

| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
iris_long <- melt(iris, id.vars = "Species")
kable(head(iris_long))

| Species | variable | value |
|---|---|---|
| setosa | Sepal.Length | 5.1 |
| setosa | Sepal.Length | 4.9 |
| setosa | Sepal.Length | 4.7 |
| setosa | Sepal.Length | 4.6 |
| setosa | Sepal.Length | 5.0 |
| setosa | Sepal.Length | 5.4 |
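The same wide-to-long reshape can be sketched with pivot_longer() from tidyr, as an alternative to melt(). This assumes the tidyr package is installed, which the workshop scripts may not include. The rows come out in a different order than with melt(), and `variable` is a character column rather than a factor. The name `iris_long_alt` is only illustrative.

```r
library(tidyr)

# Reshape iris from wide to long: one row per flower measurement.
# This is a substitute for the melt() call, not the workshop's own code.
iris_long_alt <- pivot_longer(iris,
                              cols = -Species,        # keep Species as the id column
                              names_to = "variable",  # former column names go here
                              values_to = "value")    # measurements go here
head(iris_long_alt)
```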
ggplot(iris_long, aes(x = Species, y = value, fill = variable)) +
  geom_bar(stat = 'identity', width = 1) +
  theme_bw()

ggplot(iris_long,
       aes(x = Species, y = value, color = variable, fill = variable)) +
  geom_bar(stat = 'identity', width = 1) +
  coord_polar(theta = 'x') +
  theme_bw()

kable(head(gapminder))

| country | continent | year | lifeExp | pop | gdpPercap |
|---|---|---|---|---|---|
| Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.4453 |
| Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.8530 |
| Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.1007 |
| Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.1971 |
| Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.9811 |
| Afghanistan | Asia | 1977 | 38.438 | 14880372 | 786.1134 |
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

Who’s the outlier?

gapminder %>% filter(gdpPercap > 60000)
## # A tibble: 5 x 6
## country continent year lifeExp pop gdpPercap
## <fctr> <fctr> <int> <dbl> <int> <dbl>
## 1 Kuwait Asia 1952 55.565 160000 108382.35
## 2 Kuwait Asia 1957 58.033 212846 113523.13
## 3 Kuwait Asia 1962 60.470 358266 95458.11
## 4 Kuwait Asia 1967 64.624 575003 80894.88
## 5 Kuwait Asia 1972 67.712 841934 109347.87
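One way to see the outlier on the plot itself is to draw the filtered rows as a second layer. This is a sketch, assuming the gapminder, dplyr, and ggplot2 packages are installed; the 60000 threshold is the one used in the filter above, and the colors and label placement are arbitrary choices.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

# Rows with unusually high GDP per capita (same threshold as the filter above)
outliers <- gapminder %>% filter(gdpPercap > 60000)

ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(color = "grey60") +
  # Second layer: same x and y aesthetics, but only the outlier rows
  geom_point(data = outliers, color = "red") +
  # Label each highlighted point with its year
  geom_text(data = outliers, aes(label = year), vjust = -1, size = 3)
```

Passing a different `data` argument to a single geom leaves the rest of the plot untouched, which makes it a convenient way to highlight a subset.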
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_hex()

ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_density2d(aes(color = ..level..), bins = 20) +
  scale_color_viridis()

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(color = continent, size = pop), alpha = 0.8) +
  scale_x_continuous(trans = 'log') +
  facet_wrap(~year) +
  scale_color_brewer(type = "qual", palette = "Accent") +
  theme_hc(bgc = 'darkunica') +
  theme(text = element_text(size = 9))
p

Prepare data:
country_df <- map_data('world') %>%
  rename("country" = "region")
country_df$country[country_df$country == "USA"] <- "United States"
# Take the mean across all years for each country:
gapminder_means <- gapminder %>%
  group_by(country, continent) %>%
  summarise(lifeExp = mean(lifeExp),
            pop = mean(pop),
            gdpPercap = mean(gdpPercap))
plot_dat <- left_join(gapminder_means, country_df, by = "country")

ggplot(plot_dat) +
  geom_polygon(aes(x = long, y = lat, fill = lifeExp, group = group)) +
  scale_fill_viridis(option = "A") +
  coord_quickmap() +
  theme_few()

What questions could we ask with this data?
How could we visually answer those questions?
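As one possible example, a question might be how life expectancy has changed over time on each continent. The sketch below is only an illustration and not part of the workshop script; it assumes the gapminder, dplyr (version 1.0 or later, for `.groups`), and ggplot2 packages are installed, and the name `continent_trend` is made up for this example.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

# Mean life expectancy per continent and year
continent_trend <- gapminder %>%
  group_by(continent, year) %>%
  summarise(lifeExp = mean(lifeExp), .groups = "drop")

# One line per continent, with a point at each survey year
ggplot(continent_trend, aes(x = year, y = lifeExp, color = continent)) +
  geom_line() +
  geom_point()
```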
Thanks!
Adrienne Marshall mars7850@vandals.uidaho.edu